Combining Gaussian Mixture Models and Segmental Feature Models for Speaker Recognition

Authors

Milana Milosevic

Ulrike Glavitsch

Abstract

In most speaker recognition systems, speech utterances are not constrained in content or language. In a text-dependent speaker recognition system, the lexical content and the language of the speech are known in advance. The goal of this paper is to show that this information can be used by a segmental features (SF) approach to improve a standard Gaussian mixture model with MFCC features (GMM-MFCC). Speech features such as mean energy, delta energy, pitch, delta pitch, the formants F1–F4 and their bandwidths B1–B4, and the difference between F2 and F1 are calculated on segments and are associated with phonemes and phoneme groups for each speaker. The SF and GMM-MFCC approaches are combined by multiplying the outputs of the two classifiers. All experiments are performed on two versions of TEVOID: TEVOID16 with 16 speakers and the upgraded TEVOID50 with 50 speakers. On TEVOID16, SF achieves a recognition rate of 84.23%, GMM-MFCC 91.75%, and the combined approach 95.12%. On TEVOID50, the SF approach gives 68.69%, while both GMM-MFCC and the combined model achieve 95.84%. On both databases, the number of male/female confusions decreased for the combined model. These results are promising for using segmental features to improve the recognition rate of text-dependent systems.
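The abstract states only that the two classifiers are combined by multiplying their outputs. The following is a minimal sketch of such product-rule fusion, not the authors' implementation. It assumes that each classifier yields one score per enrolled speaker, that the GMM-MFCC scores are log-likelihoods converted to probabilities with a softmax, and that the SF scores are already probabilities. The function names and the example scores are hypothetical.

```python
import math


def softmax(log_scores):
    """Convert per-speaker log-scores into probabilities that sum to 1."""
    m = max(log_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_product(p_a, p_b):
    """Product-rule fusion: multiply per-speaker scores, then renormalize."""
    if len(p_a) != len(p_b):
        raise ValueError("both classifiers must score the same speakers")
    prod = [a * b for a, b in zip(p_a, p_b)]
    total = sum(prod)
    if total == 0.0:
        raise ValueError("fused scores are all zero")
    return [p / total for p in prod]


def identify(scores):
    """Return the index of the best-scoring speaker."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores for three enrolled speakers (illustration only).
gmm_log_likelihoods = [-41.0, -40.5, -44.0]  # assumed GMM-MFCC output
sf_probs = [0.6, 0.3, 0.1]                   # assumed SF output

p_gmm = softmax(gmm_log_likelihoods)
fused = fuse_product(p_gmm, sf_probs)

print("GMM-MFCC alone picks speaker", identify(p_gmm))
print("Combined model picks speaker", identify(fused))
```

In this made-up example the GMM-MFCC scores slightly favor speaker 1, while the SF scores favor speaker 0 strongly enough that the product selects speaker 0. A product of scores lets a confident classifier override a marginal decision by the other, which is one plausible reason a combined model could reduce confusions.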


Similar resources

Subsegmental, Segmental and Suprasegmental Features for Speaker Recognition Using Gaussian Mixture Model

In the feature extraction stage, features representing speaker information are extracted from the speech signal. In the present study LP residual derived from the speech data is used for training and testing and also processing of LP residual in time domain at subsegmental, segmental and suprasegmental levels. In the training phase, GMMs are built, one for each speaker, using the training data ...


Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model

Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....


A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...


On the number of Gaussian components in a mixture: an application to speaker verification tasks

Despite all advances in the speaker recognition domain, Gaussian Mixture Models (GMM) remain the state-of-the-art modeling technique in speaker recognition systems. The key idea is to approximate the probability density function of the feature vectors associated with a speaker with a weighted sum of Gaussian densities. Although the extremely efficient Expectation-Maximization (EM) algorithm c...


Skew Gaussian Mixture Models for Speaker Recognition

The current paper proposes skew Gaussian mixture models for speaker recognition and an associated algorithm for its training from experimental data. Speaker identification experiments were conducted, in which speakers were modeled using the familiar Gaussian mixture models (GMM), and the new skewGMM. Each model type was evaluated using two sets of feature vectors, the mel-frequency cepstral coe...



Publication date: 2017
